{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Frequent Itemsets via Apriori Algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apriori function to extract frequent itemsets for association rule mining"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> from mlxtend.frequent_patterns import apriori"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apriori is a popular algorithm [1] for extracting frequent itemsets with applications in association rule learning. The apriori algorithm has been designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered as \"frequent\" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is defined as a set of items that occur together in at least 50% of all transactions in the database."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "[1] Agrawal, Rakesh, and Ramakrishnan Srikant. \"[Fast algorithms for mining association rules](https://www.it.uu.se/edu/course/homepage/infoutv/ht08/vldb94_rj.pdf).\" Proc. 20th int. conf. very large data bases, VLDB. Vol. 1215. 1994.\n",
    "\n",
    "## Related\n",
    "\n",
    "- [FP-Growth](./fpgrowth.md)\n",
    "- [FP-Max](./fpmax.md)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1 -- Generating Frequent Itemsets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `apriori` function expects data in a one-hot encoded pandas DataFrame.\n",
    "Suppose we have the following transaction data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
    "           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
    "           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],\n",
    "           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],\n",
    "           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can transform it into the right format via the `TransactionEncoder` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Apple</th>\n",
       "      <th>Corn</th>\n",
       "      <th>Dill</th>\n",
       "      <th>Eggs</th>\n",
       "      <th>Ice cream</th>\n",
       "      <th>Kidney Beans</th>\n",
       "      <th>Milk</th>\n",
       "      <th>Nutmeg</th>\n",
       "      <th>Onion</th>\n",
       "      <th>Unicorn</th>\n",
       "      <th>Yogurt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  \\\n",
       "0  False  False  False   True      False          True   True    True   True   \n",
       "1  False  False   True   True      False          True  False    True   True   \n",
       "2   True  False  False   True      False          True   True   False  False   \n",
       "3  False   True  False  False      False          True   True   False  False   \n",
       "4  False   True  False   True       True          True  False   False   True   \n",
       "\n",
       "   Unicorn  Yogurt  \n",
       "0    False    True  \n",
       "1    False    True  \n",
       "2    False   False  \n",
       "3     True    True  \n",
       "4    False   False  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from mlxtend.preprocessing import TransactionEncoder\n",
    "\n",
    "te = TransactionEncoder()\n",
    "te_ary = te.fit(dataset).transform(dataset)\n",
    "df = pd.DataFrame(te_ary, columns=te.columns_)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let us return the items and itemsets with at least 60% support:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(3)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>(5)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(6)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(8)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(10)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(3, 5)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(8, 3)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(5, 6)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(8, 5)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(10, 5)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(8, 3, 5)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support   itemsets\n",
       "0       0.8        (3)\n",
       "1       1.0        (5)\n",
       "2       0.6        (6)\n",
       "3       0.6        (8)\n",
       "4       0.6       (10)\n",
       "5       0.8     (3, 5)\n",
       "6       0.6     (8, 3)\n",
       "7       0.6     (5, 6)\n",
       "8       0.6     (8, 5)\n",
       "9       0.6    (10, 5)\n",
       "10      0.6  (8, 3, 5)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from mlxtend.frequent_patterns import apriori\n",
    "\n",
    "apriori(df, min_support=0.6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, `apriori` returns the column indices of the items, which may be useful in downstream operations such as association rule mining. For better readability, we can set `use_colnames=True` to convert these integer values into the respective item names: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>(Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Yogurt)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs, Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Yogurt)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans, Eggs)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                     itemsets\n",
       "0       0.8                       (Eggs)\n",
       "1       1.0               (Kidney Beans)\n",
       "2       0.6                       (Milk)\n",
       "3       0.6                      (Onion)\n",
       "4       0.6                     (Yogurt)\n",
       "5       0.8         (Eggs, Kidney Beans)\n",
       "6       0.6                (Onion, Eggs)\n",
       "7       0.6         (Kidney Beans, Milk)\n",
       "8       0.6        (Onion, Kidney Beans)\n",
       "9       0.6       (Kidney Beans, Yogurt)\n",
       "10      0.6  (Onion, Kidney Beans, Eggs)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apriori(df, min_support=0.6, use_colnames=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2 -- Selecting and Filtering Results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The advantage of working with pandas `DataFrames` is that we can use its convenient features to filter the results. For instance, let's assume we are only interested in itemsets of length 2 that have a support of at least 80 percent. First, we create the frequent itemsets via `apriori` and add a new column that stores the length of each itemset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>(Kidney Beans)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Milk)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Yogurt)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs, Kidney Beans)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Eggs)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Milk)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Yogurt)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans, Eggs)</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                     itemsets  length\n",
       "0       0.8                       (Eggs)       1\n",
       "1       1.0               (Kidney Beans)       1\n",
       "2       0.6                       (Milk)       1\n",
       "3       0.6                      (Onion)       1\n",
       "4       0.6                     (Yogurt)       1\n",
       "5       0.8         (Eggs, Kidney Beans)       2\n",
       "6       0.6                (Onion, Eggs)       2\n",
       "7       0.6         (Kidney Beans, Milk)       2\n",
       "8       0.6        (Onion, Kidney Beans)       2\n",
       "9       0.6       (Kidney Beans, Yogurt)       2\n",
       "10      0.6  (Onion, Kidney Beans, Eggs)       3"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)\n",
    "frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))\n",
    "frequent_itemsets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we can select the results that satisfy our desired criteria as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs, Kidney Beans)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   support              itemsets  length\n",
       "5      0.8  (Eggs, Kidney Beans)       2"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frequent_itemsets[ (frequent_itemsets['length'] == 2) &\n",
    "                   (frequent_itemsets['support'] >= 0.8) ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, using the Pandas API, we can select entries based on the \"itemsets\" column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Eggs)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   support       itemsets  length\n",
       "6      0.6  (Onion, Eggs)       2"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Frozensets**\n",
    "\n",
    "Note that the entries in the \"itemsets\" column are of type `frozenset`, which is built-in Python type that is similar to a Python `set` but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since `frozenset`s are sets, the item order does not matter. I.e., the query\n",
    "\n",
    "`frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ]`\n",
    "    \n",
    "is equivalent to any of the following three\n",
    "\n",
    "- `frequent_itemsets[ frequent_itemsets['itemsets'] == {'Eggs', 'Onion'} ]`\n",
    "- `frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Eggs', 'Onion')) ]`\n",
    "- `frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Onion', 'Eggs')) ]`\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 3 -- Working with Sparse Representations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To save memory, you may want to represent your transaction data in the sparse format.\n",
    "This is especially useful if you have lots of products and small transactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Apple</th>\n",
       "      <th>Corn</th>\n",
       "      <th>Dill</th>\n",
       "      <th>Eggs</th>\n",
       "      <th>Ice cream</th>\n",
       "      <th>Kidney Beans</th>\n",
       "      <th>Milk</th>\n",
       "      <th>Nutmeg</th>\n",
       "      <th>Onion</th>\n",
       "      <th>Unicorn</th>\n",
       "      <th>Yogurt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  \\\n",
       "0  False  False  False   True      False          True   True    True   True   \n",
       "1  False  False   True   True      False          True  False    True   True   \n",
       "2   True  False  False   True      False          True   True   False  False   \n",
       "3  False   True  False  False      False          True   True   False  False   \n",
       "4  False   True  False   True       True          True  False   False   True   \n",
       "\n",
       "   Unicorn  Yogurt  \n",
       "0    False    True  \n",
       "1    False    True  \n",
       "2    False   False  \n",
       "3     True    True  \n",
       "4    False   False  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "oht_ary = te.fit(dataset).transform(dataset, sparse=True)\n",
    "sparse_df = pd.DataFrame.sparse.from_spmatrix(oht_ary, columns=te.columns_)\n",
    "sparse_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing 21 combinations | Sampling itemset size 3\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>(Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Yogurt)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.8</td>\n",
       "      <td>(Eggs, Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Kidney Beans, Yogurt)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.6</td>\n",
       "      <td>(Onion, Kidney Beans, Eggs)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                     itemsets\n",
       "0       0.8                       (Eggs)\n",
       "1       1.0               (Kidney Beans)\n",
       "2       0.6                       (Milk)\n",
       "3       0.6                      (Onion)\n",
       "4       0.6                     (Yogurt)\n",
       "5       0.8         (Eggs, Kidney Beans)\n",
       "6       0.6                (Onion, Eggs)\n",
       "7       0.6         (Kidney Beans, Milk)\n",
       "8       0.6        (Onion, Kidney Beans)\n",
       "9       0.6       (Kidney Beans, Yogurt)\n",
       "10      0.6  (Onion, Kidney Beans, Eggs)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "apriori(sparse_df, min_support=0.6, use_colnames=True, verbose=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## apriori\n",
      "\n",
      "*apriori(df, min_support=0.5, use_colnames=False, max_len=None, verbose=0, low_memory=False)*\n",
      "\n",
      "Get frequent itemsets from a one-hot DataFrame\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `df` : pandas DataFrame or pandas SparseDataFrame\n",
      "\n",
      "    pandas DataFrame the encoded format.\n",
      "    The allowed values are either 0/1 or True/False.\n",
      "    For example,\n",
      "\n",
      "```\n",
      "    Apple  Bananas  Beer  Chicken  Milk  Rice\n",
      "    0      1        0     1        1     0     1\n",
      "    1      1        0     1        0     0     1\n",
      "    2      1        0     1        0     0     0\n",
      "    3      1        1     0        0     0     0\n",
      "    4      0        0     1        1     1     1\n",
      "    5      0        0     1        0     1     1\n",
      "    6      0        0     1        0     1     0\n",
      "    7      1        1     0        0     0     0\n",
      "```\n",
      "\n",
      "\n",
      "- `min_support` : float (default: 0.5)\n",
      "\n",
      "    A float between 0 and 1 for minumum support of the itemsets returned.\n",
      "    The support is computed as the fraction\n",
      "    `transactions_where_item(s)_occur / total_transactions`.\n",
      "\n",
      "\n",
      "- `use_colnames` : bool (default: False)\n",
      "\n",
      "    If `True`, uses the DataFrames' column names in the returned DataFrame\n",
      "    instead of column indices.\n",
      "\n",
      "\n",
      "- `max_len` : int (default: None)\n",
      "\n",
      "    Maximum length of the itemsets generated. If `None` (default) all\n",
      "    possible itemsets lengths (under the apriori condition) are evaluated.\n",
      "\n",
      "\n",
      "- `verbose` : int (default: 0)\n",
      "\n",
      "    Shows the number of iterations if >= 1 and `low_memory` is `True`. If\n",
      "    >=1 and `low_memory` is `False`, shows the number of combinations.\n",
      "\n",
      "\n",
      "- `low_memory` : bool (default: False)\n",
      "\n",
      "    If `True`, uses an iterator to search for combinations above `min_support`.\n",
      "    Note that while `low_memory=True` should only be used for large dataset\n",
      "    if memory resources are limited, because this implementation is approx.\n",
      "    3-6x slower than the default.\n",
      "\n",
      "\n",
      "**Returns**\n",
      "\n",
      "pandas DataFrame with columns ['support', 'itemsets'] of all itemsets\n",
      "    that are >= `min_support` and < than `max_len`\n",
      "    (if `max_len` is not None).\n",
      "    Each itemset in the 'itemsets' column is of type `frozenset`,\n",
      "    which is a Python built-in type that behaves similarly to\n",
      "    sets except that it is immutable\n",
      "    (For more info, see\n",
      "    https://docs.python.org/3.6/library/stdtypes.html#frozenset).\n",
      "\n",
      "**Examples**\n",
      "\n",
      "For usage examples, please see\n",
      "    [http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.frequent_patterns/apriori.md', 'r') as f:\n",
    "    print(f.read())"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
